{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Protein Comparison\n",
    "\n",
    "## Objective and Prerequisites\n",
    "\n",
    "In this example, we’ll show you how to use mathematical optimization to address a Protein Comparison problem. You’ll learn how to model this problem – which involves measuring the similarities of two proteins – as a quadratic assignment problem using the Gurobi Python API and find an optimal solution to it with the Gurobi Optimizer.\n",
    "\n",
    "This model is example 29 from the fifth edition of Model Building in Mathematical Programming by H. Paul Williams on pages 290-291 and 345.\n",
    "\n",
    "This modeling example is at the advanced level, where we assume that you know Python and the Gurobi Python API and that you have advanced knowledge of building mathematical optimization models. Typically, the objective function and/or constraints of these examples are complex or require advanced features of the Gurobi Python API.\n",
    "\n",
    "**Download the Repository** <br /> \n",
    "You can download the repository containing this and other examples by clicking [here](https://github.com/Gurobi/modeling-examples/archive/master.zip). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install gurobipy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gurobipy as gp\n",
    "from gurobipy import GRB\n",
    "\n",
    "# tested with Python 3.7.0 & Gurobi 9.1.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem Description\n",
    "\n",
    "This problem is based on one problem discussed in a paper by Forrester and Greenberg (2008).\n",
    "It is concerned with measuring the similarities of two proteins. A protein can be\n",
    "represented by a graph with the acids represented by the nodes and\n",
    "the edges being present when two acids are within a threshold distance of each\n",
    "other. This graphical representation is known as the contact map of the protein.\n",
    "Given two contact maps representing proteins, we would like to find the largest\n",
    "(measured by the number of corresponding edges) isomorphic subgraphs in each\n",
    "graph. The acids in each of the proteins are ordered. We need to preserve this\n",
    "ordering in each of the subgraphs, which implies that there can be no crossovers\n",
    "in the comparison. This is illustrated in the following figure. \n",
    "\n",
    "![crossover](crossover.PNG)\n",
    "\n",
    "If $i < k$ in the contact map for the first protein then we cannot have $l < j$ in the second protein, if $i$ is to be\n",
    "associated with $j$ and $k$ with $l$ in the comparison. The following figure gives a comparison between two small contact\n",
    "maps leading to five corresponding edges.\n",
    "\n",
    "![comparison](comparison.PNG)\n",
    "\n",
    "The goal is to compare the contact maps given by the following figures.\n",
    "\n",
    "Mapping of the first protein:\n",
    "\n",
    "![map1](map1.PNG)\n",
    "\n",
    "Mapping of the second protein:\n",
    "\n",
    "![map1](map2.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Formulation\n",
    "\n",
    "The mapping of the first protein is represented by the graph $G_{1} = (N_{1},E_{1})$, and the  mapping of the second protein is represented by the  graph $G_{2} = (N_{2},E_{2})$\n",
    "\n",
    "### Sets and Indices\n",
    "\n",
    "$i,k \\in N_{1} =\\{1,2,...,9\\}$: Nodes in graph $G_{1}$ which are the acids in the first protein.\n",
    "\n",
    "$E_{1} = \\{(1,2),(2,9),(3,4),(3,5),(5,6),(6,7),(7,9),(8,9) \\}$: Edges in graph $G_{1}$.\n",
    "\n",
    "$j,l \\in N_{2} =\\{1,2,...,11\\} $: Nodes in graph $G_{2}$ which are the acids in the second protein.\n",
    "\n",
    "$E_{2} = \\{(1,4),(2,3),(4,6),(4,7),(5,6),(6,8),(7,8),(7,10),(9,10),(10,11) \\}$: Edges in graph $G_{2}$.\n",
    "\n",
    "### Decision variables\n",
    "\n",
    "$\\text{map}_{i,j} = x_{i,j} = 1$, iff node $i$ in $G_{1}$ is matched with node $j$ in $G_{2}$.\n",
    "\n",
    "$ w_{i,j,k,l} = x_{i,j}*x_{k,l}  = 1$, iff an edge $(i,k) \\in E_{1}$ is matched with edge $(j,l) \\in E_{2}$.\n",
    "\n",
    "### Constraints\n",
    "\n",
    "**$G_{1}$ matching**: No node in $G_{1}$ can be matched with more than one  in $G_{2}$.\n",
    "\n",
    "$$\n",
    "\\sum_{i \\in N_{1} } x_{i,j} \\leq 1 \\quad \\forall j \\in N_{2}\n",
    "$$\n",
    "\n",
    "**$G_{2}$ matching**: No node in $G_{2}$ can be matched with more than one  in $G_{1}$.\n",
    "\n",
    "$$\n",
    "\\sum_{j \\in N_{2} } x_{i,j} \\leq 1 \\quad \\forall i \\in N_{1}\n",
    "$$\n",
    "\n",
    "**Edge matching**: if edges $(i, k)$ and $(j, l)$ are matched then so are the corresponding nodes.\n",
    "\n",
    "$$\n",
    " w_{i,j,k,l} \\leq x_{i,j}, \\;  w_{i,j,k,l} \\leq x_{k,l} \\quad \\forall \n",
    " (i,j,k,l) \\in ijkl = \\{ i,k \\in N_{1}, j,l \\in N_{2}: (i,k) \\in E_{1},  (j,l) \\in E_{2}  \\}\n",
    "$$\n",
    "\n",
    "**No crossovers**: There can be no crossovers.\n",
    "\n",
    "$$\n",
    "x_{i,j} +  x_{k,l} \\leq 1 \\quad \\forall \n",
    "(i,j,k,l) \\in ijklx = \\{ (i,j,k,l) \\in ijkl: i < k \\in N_{1},  j > l \\in N_{2}  \\}\n",
    "$$\n",
    "\n",
    "\n",
    "### Objective function\n",
    "The objective is to maximize the number of edge matchings.\n",
    "\n",
    "$$\n",
    "\\sum_{(i,j,k,l) \\in ijkl} w_{i,j,k,l}\n",
    "$$\n",
    "\n",
    "This linear integer programming formulation of the Protein Comparison problem is in fact a linearization of a quadratic assignment formulation of this problem. With Gurobi 9.1.0, you can directly solve the quadratic assignment formulation of the Protein Comparison problem without the auxiliary variables and the logical constraints.\n",
    "\n",
    "### Objective function\n",
    "The objective is to maximize the number of edge matchings.\n",
    "\n",
    "$$\n",
    "\\sum_{(i,j,k,l) \\in ijkl} x_{i,j}*x_{k,l}\n",
    "$$\n",
    "\n",
    "### Constraints\n",
    "\n",
    "**$G_{1}$ matching**: No node in $G_{1}$ can be matched with more than one  in $G_{2}$.\n",
    "\n",
    "$$\n",
    "\\sum_{i \\in N_{1} } x_{i,j} \\leq 1 \\quad \\forall j \\in N_{2}\n",
    "$$\n",
    "\n",
    "**$G_{2}$ matching**: No node in $G_{2}$ can be matched with more than one  in $G_{1}$.\n",
    "\n",
    "$$\n",
    "\\sum_{j \\in N_{2} } x_{i,j} \\leq 1 \\quad \\forall i \\in N_{1}\n",
    "$$\n",
    "\n",
    "**No crossovers**: There can be no crossovers.\n",
    "\n",
    "$$\n",
    "x_{i,j} +  x_{k,l} \\leq 1 \\quad \\forall \n",
    "(i,j,k,l) \\in ijklx = \\{ (i,j,k,l) \\in ijkl: i < k \\in N_{1},  j > l \\in N_{2}  \\}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Input Data "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# nodes in G1\n",
    "\n",
    "nodes1 = [*range(1,10)]\n",
    "\n",
    "# edges (i,k) in G1\n",
    "\n",
    "edges1 = [(1,2),(2,9),(3,4),(3,5),(5,6),(6,7),(7,9),(8,9)]\n",
    "\n",
    "# nodes in G2\n",
    "\n",
    "nodes2 = [*range(1,12)]\n",
    "\n",
    "# edges (j,l) in G2\n",
    "\n",
    "edges2 = [(1,4),(2,3),(4,6),(4,7),(5,6),(6,8),(7,8),(7,10),(9,10),(10,11)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preprocessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Node matching: matchings of nodes in G1 with nodes in G2\n",
    "\n",
    "list_ij = []\n",
    "\n",
    "for i in nodes1:\n",
    "    for j in nodes2:\n",
    "        tp = i,j\n",
    "        list_ij.append(tp)\n",
    "        \n",
    "ij = gp.tuplelist(list_ij)\n",
    "\n",
    "# Edge matching: matchings of edges in G1 with edges in G2\n",
    "\n",
    "list_ijkl = []\n",
    "\n",
    "for i,k in edges1:\n",
    "    for j,l in edges2:\n",
    "        tp = i,j,k,l\n",
    "        list_ijkl.append(tp)\n",
    "        \n",
    "ijkl = gp.tuplelist(list_ijkl)\n",
    "\n",
    "# No crossover \n",
    "\n",
    "list_nox = []\n",
    "\n",
    "for i,j in ij:\n",
    "    for k,l in ij:\n",
    "        if i < k and l < j:\n",
    "            tp = i,j,k,l\n",
    "            list_nox.append(tp)\n",
    "            \n",
    "nox = gp.tuplelist(list_nox)  \n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Deployment\n",
    "\n",
    "We create a model and the decision variables. The decision variables map the nodes on each graph, with the constraint that ensures that the edges of each graph are properly matched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using license file c:\\gurobi\\gurobi.lic\n"
     ]
    }
   ],
   "source": [
    "model = gp.Model('ProteinComparison')\n",
    "\n",
    "# Map nodes in G1 with nodes in G2\n",
    "map_nodes = model.addVars(ij, vtype=GRB.BINARY, name=\"map\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**$G_{1}$ matching constraint**: No node in $G_{1}$ can be matched with more than one  in $G_{2}$.\n",
    "\n",
    "$$\n",
    "\\sum_{i \\in N_{1} } x_{i,j} \\leq 1 \\quad \\forall j \\in N_{2}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# At most one node in G1 is matched with a node in G2\n",
    "\n",
    "node1_match = model.addConstrs((gp.quicksum(map_nodes[i,j] for i in nodes1) <= 1 for j in nodes2 ) ,name='node1_match')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**$G_{2}$ matching constraint**: No node in $G_{2}$ can be matched with more than one  in $G_{1}$.\n",
    "\n",
    "$$\n",
    "\\sum_{j \\in N_{2} } x_{i,j} \\leq 1 \\quad \\forall i \\in N_{1}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# At most one node in G2 is matched with a node in G1\n",
    "\n",
    "node2_match = model.addConstrs((gp.quicksum(map_nodes[i,j] for j in nodes2) <= 1 for i in nodes1 ) ,name='node2_match')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**No crossovers**: There can be no crossovers.\n",
    "\n",
    "$$\n",
    "x_{i,j} +  x_{k,l} \\leq 1 \\quad \\forall \n",
    "(i,j,k,l) \\in ijklx = \\{ (i,j,k,l) \\in ijkl: i < k \\in N_{1},  j > l \\in N_{2}  \\}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# No crossovers\n",
    "\n",
    "no_crossover = model.addConstrs((map_nodes[i,j] + map_nodes[k,l] <= 1 for i,j,k,l in nox), name='no_crossover')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Objective function\n",
    "\n",
    "Maximize the matchings of edges in G1 with edges in G2. \n",
    "\n",
    "$$\n",
    "\\sum_{(i,j,k,l) \\in ijkl} x_{i,j}*x_{k,l}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Objective function\n",
    "\n",
    "model.setObjective(gp.quicksum(map_nodes[i,j]*map_nodes[k,l] for i,j,k,l in ijkl ) , GRB.MAXIMIZE )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)\n",
      "Thread count: 4 physical cores, 8 logical processors, using up to 8 threads\n",
      "Optimize a model with 2000 rows, 99 columns and 4158 nonzeros\n",
      "Model fingerprint: 0x22958823\n",
      "Model has 80 quadratic objective terms\n",
      "Variable types: 0 continuous, 99 integer (99 binary)\n",
      "Coefficient statistics:\n",
      "  Matrix range     [1e+00, 1e+00]\n",
      "  Objective range  [0e+00, 0e+00]\n",
      "  QObjective range [2e+00, 2e+00]\n",
      "  Bounds range     [1e+00, 1e+00]\n",
      "  RHS range        [1e+00, 1e+00]\n",
      "Found heuristic solution: objective -0.0000000\n",
      "Presolve removed 1876 rows and 17 columns\n",
      "Presolve time: 0.01s\n",
      "Presolved: 204 rows, 162 columns, 2026 nonzeros\n",
      "Variable types: 0 continuous, 162 integer (162 binary)\n",
      "\n",
      "Root relaxation: objective -6.923077e+00, 178 iterations, 0.00 seconds\n",
      "\n",
      "    Nodes    |    Current Node    |     Objective Bounds      |     Work\n",
      " Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time\n",
      "\n",
      "     0     0    6.92308    0   52   -0.00000    6.92308      -     -    0s\n",
      "H    0     0                       5.0000000    6.92308  38.5%     -    0s\n",
      "     0     0     cutoff    0         5.00000    5.00000  0.00%     -    0s\n",
      "\n",
      "Cutting planes:\n",
      "  Gomory: 1\n",
      "  Clique: 7\n",
      "  Zero half: 20\n",
      "  RLT: 17\n",
      "\n",
      "Explored 1 nodes (267 simplex iterations) in 0.09 seconds\n",
      "Thread count was 8 (of 8 available processors)\n",
      "\n",
      "Solution count 3: 5 -0 -0 \n",
      "\n",
      "Optimal solution found (tolerance 1.00e-04)\n",
      "Best objective 5.000000000000e+00, best bound 5.000000000000e+00, gap 0.0000%\n"
     ]
    }
   ],
   "source": [
    "# Verify model formulation\n",
    "\n",
    "model.write('ProteinComparison.lp')\n",
    "\n",
    "# Run optimization engine\n",
    "\n",
    "model.optimize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Maximum number of edge matches: 5\n",
      "Edge (1, 2) in G1 is mapped with edge (2, 3) in G2\n",
      "Edge (3, 4) in G1 is mapped with edge (4, 6) in G2\n",
      "Edge (3, 5) in G1 is mapped with edge (4, 7) in G2\n",
      "Edge (5, 6) in G1 is mapped with edge (7, 8) in G2\n",
      "Edge (7, 9) in G1 is mapped with edge (9, 10) in G2\n"
     ]
    }
   ],
   "source": [
    "# Output report\n",
    "\n",
    "print(f\"Maximum number of edge matches: {round(model.objVal)}\") \n",
    "\n",
    "\n",
    "for i,j,k,l in ijkl:\n",
    "    if map_nodes[i,j].x*map_nodes[k,l].x > 0.5:\n",
    "        print(f\"Edge {i,k} in G1 is mapped with edge {j,l} in G2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## References\n",
    "\n",
    "H. Paul Williams, Model Building in Mathematical Programming, fifth edition.\n",
    "\n",
    "Forrester, R.J. and Greenberg, H.J. (2008) Quadratic Binary Programming Models in Computational Biology. Algorithmic Operations Research, 3, 110–129.\n",
    "\n",
    "Copyright © 2020 Gurobi Optimization, LLC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
